Instantaneous Control of Brownian Motion

J. Michael Harrison
Michael  I.  Taksar 

Stanford  University 

Abstract 

A controller continuously monitors a storage system, such as an
inventory or bank account, whose content $Z = \{Z_t, t \ge 0\}$
fluctuates as a $(\mu, \sigma^2)$ Brownian motion in the absence of
control. Holding costs are incurred continuously at rate $h(Z_t)$. At
any time, the controller may instantaneously increase the content of
the system, incurring a proportional cost of $r$ times the size of the
increase, or decrease the content at a cost of $\ell$ times the size of
the decrease. We consider the case where $h$ is convex on a finite
interval $[\alpha, \beta]$ and $h = \infty$ outside this interval. The
objective is to minimize the expected discounted sum of holding costs
and control costs over an infinite planning horizon.

It is shown that there exists an optimal control limit policy,
characterized by two parameters $a$ and $b$
($\alpha \le a < b \le \beta$). Roughly speaking, this policy exerts
the minimum amounts of control sufficient to keep $Z_t \in [a,b]$ for
all $t \ge 0$. Put another way, the optimal control limit policy
imposes on $Z$ a lower reflecting barrier at $a$ and an upper
reflecting barrier at $b$. We do not give a full-blown algorithm for
construction of the optimal control limits, but a computational scheme
could easily be developed from our constructive proof of existence.

1.  Introduction 

Consider a controller who continuously monitors the content of a
storage system, such as an inventory or bank account. In the absence
of any control, the content process $Z = \{Z_t, t \ge 0\}$ fluctuates
as a Brownian motion with drift $\mu$ and variance $\sigma^2$, and
holding costs are continuously incurred at rate $h(Z_t)$. In order to
avoid excessive holding costs, the controller may at any time increase
the content of the system by any amount desired, incurring a
proportional cost of $r$ times the size of the increase. Similarly, he
may decrease the content by any amount desired, incurring a
proportional cost of $\ell$ times the size of the decrease. Hereafter
we use the term pushing right to mean increasing the content of the
system, and pushing left to mean decreasing the content. Thus $r$ and
$\ell$ are the proportional control costs associated with pushing right
and pushing left respectively.

The controller's objective is to find a policy that minimizes the
expected discounted sum of holding costs and control costs over an
infinite planning horizon, where future costs are continuously
discounted at interest rate $\gamma > 0$. To formulate this problem in
precise mathematical terms, we begin with a $(\mu, \sigma^2)$ Brownian
motion $X = \{X_t, t \ge 0\}$, denoting by $P_x$ the distribution on
the path space of $X$ corresponding to initial state $x$. A policy is
defined as a pair of nonnegative processes $R = \{R_t, t \ge 0\}$ and
$L = \{L_t, t \ge 0\}$ that are non-decreasing and non-anticipating
with respect to $X$. Interpret $R_t$ and $L_t$ as the cumulative
amounts of rightward movement and leftward movement, respectively,
effected by the controller over the time interval $[0,t]$. The content
process under policy $(R,L)$ is

    $Z_t = X_t + R_t - L_t , \quad t \ge 0 ,$

and we define the associated cost function

    $k(x) = E_x\left[ \int_0^\infty e^{-\gamma t} h(Z_t)\,dt + r \int_0^\infty e^{-\gamma t}\,dR_t + \ell \int_0^\infty e^{-\gamma t}\,dL_t \right] ,$

with the Riemann-Stieltjes integrals on the right defined to include
the control costs $rR_0$ and $\ell L_0$ incurred at $t = 0$ (see §3).
Our objective is to find a policy which minimizes $k(x)$ for every
starting state $x$.

An essential feature of this problem is that the controller can
instantaneously change the content (or state) of the storage system.
Thus, it is possible to further impose state constraints on the
controller's actions, which may be formally expressed by setting
$h(x) = \infty$ for some states $x$. In the same way, one of the
controller's options may be eliminated by setting $r = \infty$ or
$\ell = \infty$. The special case where $r < \infty$, $\ell < \infty$
and

(1.1)  $h(x) = \begin{cases} x & \text{if } x \ge 0 \\ \infty & \text{if } x < 0 \end{cases}$

was studied by Harrison and Taylor [6]. Defining $b$ as the unique
solution of a certain transcendental equation, they proved the
optimality of a control limit policy $(R^*,L^*)$ with lower limit zero
and upper limit $b$. The policy $(R^*,L^*)$ and its associated content
process $Z^* = X + R^* - L^*$ may be described as follows. If
$X_0 < 0$, one takes $R_0^* = -X_0$, so that $Z_0^* = 0$. If
$X_0 > b$, one takes $L_0^* = X_0 - b$, so that $Z_0^* = b$. After time
zero, one increases $R^*$ and $L^*$ in the minimal amounts sufficient
to achieve $0 \le Z_t^* \le b$ for all $t \ge 0$. Under this plan,
$Z^*$ is a $(\mu, \sigma^2)$ Brownian motion with a lower reflecting
barrier at zero and an upper reflecting barrier at $b$,
$R^* - R_0^*$ is the local time of $Z^*$ at zero, and $L^* - L_0^*$ is
the local time of $Z^*$ at $b$. In particular, although $R^*$ and
$L^*$ may have jumps at $t = 0$, they are continuous but singular
thereafter. The last phrase means that the set of time points at which
$R^*$ or $L^*$ increases has zero Lebesgue measure (almost surely).
This singularity expresses a bang-bang property of the optimal policy
with instantaneous control.

In this paper we consider the instantaneous control problem with
$r < \infty$, $\ell < \infty$ and a holding cost function of the form

    $h(x) = \begin{cases} \infty & \text{if } x < \alpha \\ \text{a general convex function} & \text{if } \alpha \le x \le \beta \\ \infty & \text{if } x > \beta \end{cases}$

where $-\infty < \alpha < \beta < \infty$. It will be shown that there
exists an optimal control limit policy with lower limit $a$ and upper
limit $b$, where $\alpha \le a < b \le \beta$. We do not present a
full-blown algorithm for computation of the optimal control limits $a$
and $b$, but a computational scheme could easily be developed from our
constructive proof of existence. Our treatment generalizes the result
by Harrison and Taylor [6] described earlier, and the methods used here
are also more elegant and more general in their applicability. This
improvement in methodology and presentation has itself been a major
goal in our study, although the extension to general convex holding
costs is potentially important for applications.

It will ultimately be found that the minimal cost function $f$ for our
instantaneous control problem satisfies the optimality equation (or
Bellman equation)

(1.2)  $0 = [\Gamma f(x) - \gamma f(x) + h(x)] \wedge [r + f'(x)] \wedge [\ell - f'(x)] ,$

where

    $\Gamma = \tfrac{1}{2} \sigma^2 \frac{\partial^2}{\partial x^2} + \mu \frac{\partial}{\partial x}$

is the infinitesimal generator of the Brownian motion $X$. Note that
(1.2) imposes three differential inequalities on $f$, plus the
requirement that at least one of the inequalities be tight in each
state $x$. For those familiar with standard stochastic control theory,
(1.2) has a strange appearance, so we shall begin in §2 with a
heuristic derivation of this optimality equation. A precise
mathematical formulation of the instantaneous control problem is then
given in §3. Using the ubiquitous change of variable formula (or
generalized Ito formula) for semimartingales, we show in §4 that any
smooth solution $f$ of (1.2) satisfies $f \le k$ for all cost functions
$k$ associated with feasible policies. If one can find a policy whose
cost function satisfies the optimality equation, it of course follows
that this policy is optimal. The cost function for a general control
limit policy is computed in §5, and then in §6 we show how to choose
the control limits so that our optimality equation is satisfied by the
associated cost function. Finally, §7 discusses applications of the
instantaneous control problem and some correlative references.


2.  Heuristic  Derivation  of  the  Optimality  Equation 

To simplify discussion, we assume in this section that $h(x) < \infty$
for all $x \in \mathbb{R}$ (the real line), and the letters $\alpha$
and $\beta$ will be used here with new (temporary) meanings. Suppose
that, in the stochastic control problem described in §1, the controller
can only push right or left at a rate which is not to exceed
$\theta < \infty$. We then have a more or less standard stochastic
control problem, which can be stated in the following unconventional
form. A policy is a pair of processes $(R,L)$ having the form

(2.1)  $R_t = \int_0^t \alpha_s\,ds$ and $L_t = \int_0^t \beta_s\,ds , \quad t \ge 0 ,$

where $\alpha$ and $\beta$ are non-anticipating with respect to $X$ and
satisfy $0 \le \alpha, \beta \le \theta$. The content process under
policy $(R,L)$ is $Z_t = X_t + R_t - L_t$, and we define the associated
cost function $k(x)$ as in §1. Let $f(x)$ be the pointwise infimum of
all such cost functions (the minimal cost function). Under mild
assumptions on $h$ it can be shown that $f$ is twice continuously
differentiable and satisfies the optimality equation

(2.2)  $0 = \inf\{\Gamma f(x) - \gamma f(x) + (\alpha - \beta) f'(x) + r\alpha + \ell\beta + h(x)\}$

       $= \inf\{(\Gamma f - \gamma f + h)(x) + \alpha[r + f'(x)] + \beta[\ell - f'(x)]\} ,$

where the infimum is taken over all real numbers
$\alpha, \beta \in [0, \theta]$.

Problems  of  this  type,  where  the  controller  has  the  ability  to  alter 
the  drift  of  a  diffusion  process  at  some  cost,  have  been  treated  by 
Mandl  [9],  Krylov  [8],  Fleming  and  Rishel  [4],  Gihman  and  Skorohod 
[5],  and  a  number  of  others. 

Since $f \in C^2(\mathbb{R})$, the infimum in (2.2) is attained, and
the minimizing values for $\alpha$ and $\beta$ are

    $\alpha^*(x) = \theta 1_A(x)$ where $A = \{x \in \mathbb{R}: r + f'(x) \le 0\}$ ,

    $\beta^*(x) = \theta 1_B(x)$ where $B = \{x \in \mathbb{R}: \ell - f'(x) \le 0\}$ .
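
As a small illustration of the minimization in (2.2), the following
Python sketch returns the minimizing rates for a given value of
$f'(x)$. The function name `bang_bang_rates` and its argument names are
hypothetical, introduced only for this example. Ties (a coefficient
exactly equal to zero) are resolved in favor of pushing, which is an
arbitrary convention since any rate in $[0, \theta]$ is then equally
good.

```python
def bang_bang_rates(f_prime_x, r, ell, theta):
    """Minimize alpha*(r + f'(x)) + beta*(ell - f'(x)) over alpha, beta in [0, theta].

    The objective is linear in each rate, so the minimum sits at an
    endpoint: theta when the coefficient is non-positive, 0 otherwise.
    """
    alpha = theta if r + f_prime_x <= 0 else 0.0
    beta = theta if ell - f_prime_x <= 0 else 0.0
    return alpha, beta


# Steeply decreasing f: pushing right pays off, pushing left does not.
print(bang_bang_rates(f_prime_x=-2.0, r=1.0, ell=1.0, theta=5.0))
```

Because $r + \ell > 0$ in the cases of interest, the two coefficients
cannot both be non-positive, so the sketch never pushes in both
directions at once.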
This is a bang-bang policy. In each state $x$, each available control
mode (pushing right and pushing left) is employed at either the
maximum possible intensity $\theta$ or else at the minimum intensity of
zero. The content process $Z^*$ associated with the optimal policy
satisfies the stochastic differential equation

(2.3)  $Z_t^* = X_t + \int_0^t \alpha^*(Z_s^*)\,ds - \int_0^t \beta^*(Z_s^*)\,ds$

       $= X_t + \theta \int_0^t 1_A(Z_s^*)\,ds - \theta \int_0^t 1_B(Z_s^*)\,ds ,$

the optimal policy $(R^*,L^*)$ being given by the last two terms on the
right side of (2.3). Nothing said so far depends on any particular
structure of $h$. Assuming that $h$ is convex with $h(x) \to \infty$ as
$|x| \to \infty$, it can be shown that $f$ is convex itself, with
$f(x) \to \infty$ as $|x| \to \infty$, so that $A = (-\infty, a]$ and
$B = [b, \infty)$ for some parameters $a$ and $b$
($-\infty < a < b < \infty$).

Letting $\theta \to \infty$ in an attempt to approximate the
instantaneous control problem of §1, this suggests the optimality of a
control limit policy. Starting from any state $x$, we should either
apply no control at all initially, or push right at the maximum
possible rate, or else push left at the maximum rate. In the limiting
problem, the latter two actions amount to instantaneous (jump)
displacement, either right or left. With this motivation, we now use an
infinitesimal argument to derive the optimality equation with
instantaneous control, considering infinitesimal elements of space
rather than time.


Let $f$ be the minimal cost function for the instantaneous control
problem, fix a starting state $x$, and consider a small surrounding
interval $[x-\varepsilon, x+\varepsilon]$. The preceding discussion
suggests that we should either jump immediately to $x+\varepsilon$ and
proceed optimally from there, jump immediately to $x-\varepsilon$ and
proceed optimally from there, or else apply no control up to time

    $T(\varepsilon) = \inf\{t \ge 0 : |X_t - X_0| = \varepsilon\}$

and proceed optimally thereafter. Under the first option, our total
expected discounted cost is

(2.4)  $r\varepsilon + f(x+\varepsilon) = f(x) + [r + f'(x)]\varepsilon + o(\varepsilon) ,$

under the second it is

(2.5)  $\ell\varepsilon + f(x-\varepsilon) = f(x) + [\ell - f'(x)]\varepsilon + o(\varepsilon) ,$

and under the last it is

(2.6)  $E_x\left[\int_0^{T(\varepsilon)} e^{-\gamma t} h(X_t)\,dt + e^{-\gamma T(\varepsilon)} f(X_{T(\varepsilon)})\right]$

       $= f(x) + [\Gamma f(x) - \gamma f(x) + h(x)]\,E_x[T(\varepsilon)] + o(E_x[T(\varepsilon)])$

       $= f(x) + [\Gamma f(x) - \gamma f(x) + h(x)]\,(\varepsilon/\sigma)^2 + o(\varepsilon^2) .$

In writing (2.6), we have used the fact that

    $E_x[f(X_{T(\varepsilon)}) - f(x)] / E_x[T(\varepsilon)] \to \Gamma f(x)$ as $\varepsilon \downarrow 0 ,$

and that $\sigma^2 E_x[T(\varepsilon)]/\varepsilon^2 \to 1$ as
$\varepsilon \downarrow 0$. Now to minimize our expected discounted
cost starting from $x$, we want to take the smallest of (2.4)-(2.6),
meaning that

(2.7)  $f(x) = \min\{ f(x) + [\Gamma f(x) - \gamma f(x) + h(x)]\,(\varepsilon/\sigma)^2 + o(\varepsilon^2) ,$

       $\qquad f(x) + [r + f'(x)]\varepsilon + o(\varepsilon) ,$

       $\qquad f(x) + [\ell - f'(x)]\varepsilon + o(\varepsilon) \} .$

Subtracting $f(x)$ from both sides of (2.7), and letting
$\varepsilon \downarrow 0$, we conclude that

    $0 = [\Gamma f(x) - \gamma f(x) + h(x)] \wedge [r + f'(x)] \wedge [\ell - f'(x)] ,$

which is precisely the optimality equation (1.2).
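
The limit $\sigma^2 E_x[T(\varepsilon)]/\varepsilon^2 \to 1$ used in
(2.6) can be checked numerically. The sketch below relies on the closed
form $E_x[T(\varepsilon)] = (\varepsilon/\mu)\tanh(\mu\varepsilon/\sigma^2)$
for the mean exit time of a $(\mu, \sigma^2)$ Brownian motion from a
symmetric interval of half-width $\varepsilon$. This formula is a
standard consequence of optional stopping and is not stated in the
text, so it should be read as an assumption of the example; the
function names are hypothetical.

```python
import math


def mean_exit_time(eps, mu, sigma):
    """Mean time for a (mu, sigma^2) Brownian motion to leave [x - eps, x + eps].

    Assumed closed form: (eps / mu) * tanh(mu * eps / sigma^2),
    with the driftless limit (eps / sigma)^2 when mu == 0.
    """
    if mu == 0.0:
        return (eps / sigma) ** 2
    return (eps / mu) * math.tanh(mu * eps / sigma ** 2)


def normalized_exit_time(eps, mu, sigma):
    """The ratio sigma^2 * E[T(eps)] / eps^2, which should approach 1 as eps -> 0."""
    return sigma ** 2 * mean_exit_time(eps, mu, sigma) / eps ** 2


for eps in (1.0, 0.1, 0.01, 0.001):
    print(eps, normalized_exit_time(eps, mu=0.5, sigma=2.0))
```

For nonzero drift the ratio stays strictly below one and rises toward
one as $\varepsilon$ shrinks, which is why the drift contributes only
to the $o(\varepsilon^2)$ term in (2.6).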


3. Problem Formulation

The data for our problem are a drift parameter $\mu$, a variance
parameter $\sigma^2 > 0$, control cost parameters $r$ and $\ell$, an
interest rate $\gamma > 0$, a compact state space
$S = [\alpha, \beta]$ and a convex holding cost function
$h: S \to \mathbb{R}$. We assume $r + \ell > 0$, for otherwise the
control problem would make no sense.

Let $\Omega$ be the space of all continuous functions
$\omega: [0,\infty) \to \mathbb{R}$, which is usually denoted
$C[0,\infty)$. Let $X_t: \Omega \to \mathbb{R}$ be the coordinate
projection mapping $X_t(\omega) = \omega(t)$, $t \ge 0$ and
$\omega \in \Omega$. Then $X = (X_t, t \ge 0)$ is simply the identity
map $\Omega \to \Omega$. Let $\mathcal{F} = \sigma(X_t, t \ge 0)$
denote the smallest $\sigma$-field on $\Omega$ such that $X_t$ is
$\mathcal{F}$-measurable for each $t \ge 0$, and similarly let
$\mathcal{F}_t = \sigma(X_s, 0 \le s \le t)$. Finally, for each
$x \in S$, let $P_x$ be the unique probability measure on
$(\Omega, \mathcal{F})$ such that $X$ is a Brownian motion with drift
$\mu$, variance $\sigma^2$ and starting state $x$ under $P_x$, and let
$E_x$ be the associated expectation operator. A policy is defined as a
pair of processes $R = (R_t, t \ge 0)$ and $L = (L_t, t \ge 0)$ such
that

(3.1)  $R(\omega)$ and $L(\omega)$ are right-continuous, non-negative
       and non-decreasing for all $\omega \in \Omega$, and

(3.2)  $R_t$ and $L_t$ are $\mathcal{F}_t$-measurable for all $t \ge 0$.

As usual, we summarize (3.2) by saying that $R$ and $L$ are adapted to
$\{\mathcal{F}_t\}$.

We associate with policy $(R,L)$ the controlled process
$Z = X + R - L$, and we say that $(R,L)$ is a feasible policy if

(3.3)  $P_x(Z_t \in S \text{ for all } t \ge 0) = 1$ for all $x \in S$ ,

(3.4)  $E_x\left[\int_0^\infty e^{-\gamma t}\,dR_t\right] < \infty$ for all $x \in S$ ,

and

(3.5)  $E_x\left[\int_0^\infty e^{-\gamma t}\,dL_t\right] < \infty$ for all $x \in S$ .

The integrals in (3.4) and (3.5) are defined for each fixed $\omega$ in
the ordinary Lebesgue-Stieltjes sense over $[0,\infty)$, with the
convention that $R_t = L_t = 0$ for $t < 0$. Thus the integral in
(3.4), for example, equals the sum of $R_0$ and an integral over
$(0,\infty)$. This same notational convention will be used later
without comment. We associate with a feasible policy $(R,L)$ the cost
function

    $k(x) = E_x\left\{\int_0^\infty e^{-\gamma t}[h(Z_t)\,dt + r\,dR_t + \ell\,dL_t]\right\} , \quad x \in S ,$

and $(R,L)$ is said to be optimal if $k(x)$ is minimal (among the cost
functions for feasible policies) for each $x \in S$. By defining
feasibility via (3.3)-(3.5) we are implicitly setting $h = \infty$
outside $S$ and then restricting attention to policies that have finite
expected discounted cost for every starting state $x$. We could still
enrich our definitional system to include all possible starting states
$x \in \mathbb{R}$, but for $x$ lying outside $S$, the feasibility
restriction (3.3) would obviously require that either $R$ or $L$ have a
jump at $t = 0$ so as to ensure $Z_0 \in S$. By restricting attention
to starting states $x \in S$ we avoid some irritating complications
without any significant loss of generality.

This is the most concrete possible formulation of the decision problem
described informally in §1. By taking $\Omega = C[0,\infty)$ and
$X(\omega) = \omega$, we formally express the fact that our
decision-maker observes nothing of relevance other than the sample
path of $X$, and (3.2) expresses the requirement that his actions over
the time interval $[0,t]$ depend only on the observed values of $X_s$,
$0 \le s \le t$.
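
To make the cost function of this section concrete, the following
Python sketch approximates the discounted cost along a single
discretized sample path, truncated at the last grid point. Everything
about the discretization is an assumption of the example rather than
part of the formulation: the name `discounted_cost`, the
left-endpoint Riemann sum for the holding cost, and the requirement
that `h` accept NumPy arrays. An estimate of $k(x)$ would average this
quantity over many simulated paths started at $x$.

```python
import numpy as np


def discounted_cost(t, Z, R, L, h, r, ell, gamma):
    """Path-wise discounted cost on the time grid t (truncated at t[-1]).

    Z, R, L hold the content and cumulative controls at the grid points.
    Jumps at t = 0 are charged by treating R and L as zero before time 0.
    """
    t, Z, R, L = (np.asarray(v, dtype=float) for v in (t, Z, R, L))
    disc = np.exp(-gamma * t)
    # Holding cost: left-endpoint Riemann sum of exp(-gamma t) h(Z_t) dt.
    holding = np.sum(disc[:-1] * h(Z[:-1]) * np.diff(t))
    # Control cost: Stieltjes sums, including the initial jumps R[0], L[0].
    dR = np.diff(R, prepend=0.0)
    dL = np.diff(L, prepend=0.0)
    control = np.sum(disc * (r * dR + ell * dL))
    return holding + control


# Constant content, unit holding cost, and a single jump of size 2 at t = 0.
t = np.linspace(0.0, 10.0, 100001)
cost = discounted_cost(t, np.zeros_like(t), np.full_like(t, 2.0),
                       np.zeros_like(t), lambda z: np.ones_like(z),
                       r=3.0, ell=1.0, gamma=0.5)
print(cost)
```

In the usage example the jump at time zero contributes $r R_0 = 6$
undiscounted, which mirrors the convention that the integrals include
the control costs incurred at $t = 0$.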


4.  An  Application  of  the  Generalized  Ito  Formula 

Until further notice, let $(R,L)$ be a fixed feasible policy and
$x \in S$ a fixed initial state. In the usual way, we denote by
$\Delta R_t = R_t - R_{t-}$ the jump of $R$ at time $t$, recalling that
a right-continuous function with finite left limits can have only
countably many points of discontinuity. As in §3, we take $R_t = 0$
for $t < 0$, so that $\Delta R_0 = R_0$. The same convention is used
in extending the definition of $\Delta L_t$ to $t = 0$. It will be
convenient to denote by $\rho$ and $\lambda$ the continuous parts of
$R$ and $L$ respectively, meaning that

(4.1)  $\rho_t = R_t - \sum_{0 \le s \le t} \Delta R_s$ and $\lambda_t = L_t - \sum_{0 \le s \le t} \Delta L_s$

for $t \ge 0$. Obviously $\rho$ and $\lambda$ are continuous and
non-decreasing with $\rho_0 = \lambda_0 = 0$. As before, let
$Z = X + R - L$.

Now fix $f \in C^2(S)$ and denote by $f(Z)$ the process
$(f(Z_t), t \ge 0)$. Then $\Delta f(Z)_t = f(Z_t) - f(Z_{t-})$ and we
extend this to $t = 0$ with the useful convention

    $\Delta f(Z)_0 \equiv f(Z_0) - f(X_0) .$

Finally, let
$\Gamma = \tfrac{1}{2} \sigma^2\,\partial^2/\partial x^2 + \mu\,\partial/\partial x$
as in §1, and let $T > 0$ be fixed.


(4.2) Proposition. With the assumptions and definitions above,

    $E_x[e^{-\gamma T} f(Z_T)] = f(x) + E_x\left[\int_0^T e^{-\gamma t}(\Gamma f - \gamma f)(Z_t)\,dt\right]$

    $\qquad + E_x\left[\int_0^T e^{-\gamma t} f'(Z_t)\,d(\rho - \lambda)_t\right]$

    $\qquad + E_x\left[\sum_{0 \le t \le T} e^{-\gamma t} \Delta f(Z)_t\right] .$


Remark. From (3.3)-(3.5) and the fact that $f \in C^2(S)$ it follows
that all of the expectations appearing in (4.2) exist and are finite.


Proof. This is a direct application of the change of variable formula
for semimartingales, but to make connection with the literature on
that subject we must enrich our set-up slightly. Remembering that
$x \in S$ has been fixed, let $(\Omega, \mathcal{F}^*, P_x)$ be the
completion of $(\Omega, \mathcal{F}, P_x)$, cf. Williams [11, p. 16],
and for each $t \ge 0$ let $\mathcal{F}_t^*$ be formed from
$\mathcal{F}_t$ by adding to it all $A \in \mathcal{F}^*$ such that
$P_x(A) = 0$. It is well known that the filtration
$(\mathcal{F}_t^*, t \ge 0)$ is right continuous, and hence the
filtered probability space
$(\Omega, \mathcal{F}^*, P_x, (\mathcal{F}_t^*, t \ge 0))$ satisfies
the usual conditions imposed by Meyer [10] in his treatment of
stochastic integration and the change of variable formula. Our
processes $X$, $L$, $R$ and $Z$ are all adapted to $(\mathcal{F}_t)$
and thus also to $(\mathcal{F}_t^*)$. When we use the terms adapted,
martingale, stopping time, etc., later in this proof, the underlying
filtration is understood to be $(\mathcal{F}_t^*)$.

It will be convenient to represent $X$ in the form
$X_t = X_0 + \sigma W_t + \mu t$ where $W$ is a standard Wiener process
starting at zero. Then

(4.3)  $Z_t = \sigma W_t + V_t$ where $V_t = X_0 + R_t - L_t + \mu t .$

From the defining properties of $R$ and $L$ we see that $V$ is a VF
(finite variation) process, and $W$ is of course a martingale, so $Z$
is a semimartingale. (We follow Meyer [10] in all of our terminology
concerning martingales and related theory.) Then the change of
variable formula (or generalized Ito formula) gives us

(4.4)  $f(Z_T) = f(Z_0) + \int_0^T f'(Z_{t-})\,dZ_t + \tfrac{1}{2}\sigma^2 \int_0^T f''(Z_{t-})\,dt$

       $\qquad + \sum_{0 < t \le T} [\Delta f(Z)_t - f'(Z_{t-})\,\Delta Z_t]$

       $= f(Z_0) + I_1(T) + I_2(T) + \Sigma(T) ,$

cf. Meyer [10, p. 301]. Here $I_1(T)$ is a stochastic integral over
$(0,T]$, and we have simplified the general form of $I_2(T)$ by using
the fact that $\sigma W$ is the so-called continuous martingale part of
$Z$ and $\langle \sigma W, \sigma W \rangle_t = \sigma^2 t$. Using
(4.1) and (4.3), we have

(4.5)  $I_1(T) = \int_0^T f'(Z_{t-})\,(\sigma\,dW_t + d\rho_t - d\lambda_t + \mu\,dt) + \sum_{0 < t \le T} f'(Z_{t-})\,\Delta Z_t .$

Now we can replace $f'(Z_{t-})$ by $f'(Z_t)$ in the integral on the
right side of (4.5), because the integrator is continuous, and a
similar statement holds for $I_2(T)$. Thus, substituting (4.5) into
(4.4) and combining similar terms, we have

(4.6)  $f(Z_T) = f(Z_0) + \sigma \int_0^T f'(Z_t)\,dW_t + \int_0^T f'(Z_t)\,d(\rho - \lambda)_t$

       $\qquad + \int_0^T \Gamma f(Z_t)\,dt + \sum_{0 < t \le T} \Delta f(Z)_t .$


Now let $Y_t = \exp(-\gamma t)$, $t \ge 0$. Because $Y$ is a continuous
VF process, the general integration by parts formula stated on page 303
of Meyer [10] simplifies to give us (in this equation, square brackets
denote quadratic variation)

(4.7)  $Y_T f(Z_T) = Y_0 f(Z_0) + \int_0^T Y_{t-}\,df(Z)_t + \int_0^T f(Z_{t-})\,dY_t + [Y, f(Z)]_T$

       $= f(Z_0) + \int_0^T Y_t\,df(Z)_t + \int_0^T f(Z_t)\,dY_t ,$

which is equivalent to

(4.8)  $e^{-\gamma T} f(Z_T) = f(Z_0) + \int_0^T e^{-\gamma t}\,df(Z)_t - \gamma \int_0^T e^{-\gamma t} f(Z_t)\,dt .$

Now we calculate $df(Z)_t$ from (4.4), substitute this into (4.8) and
collect similar terms to get

(4.9)  $e^{-\gamma T} f(Z_T) = f(Z_0) + \sigma \int_0^T e^{-\gamma t} f'(Z_t)\,dW_t$

       $\qquad + \int_0^T e^{-\gamma t} f'(Z_t)\,d(\rho - \lambda)_t + \int_0^T e^{-\gamma t}(\Gamma f - \gamma f)(Z_t)\,dt$

       $\qquad + \sum_{0 < t \le T} e^{-\gamma t} \Delta f(Z)_t .$

Next, because $\Delta f(Z)_0 = f(Z_0) - f(X_0)$, we have

(4.10)  $f(Z_0) + \sum_{0 < t \le T} e^{-\gamma t} \Delta f(Z)_t = f(X_0) + \sum_{0 \le t \le T} e^{-\gamma t} \Delta f(Z)_t .$

We now substitute (4.10) into (4.9) and take $E_x$ of both sides,
observing that the Ito integral involving $dW_t$ has zero expectation
because its integrand is bounded. This yields equation (4.2) and thus
completes the proof.

Maintaining the set-up for (4.2), we now define the cumulative
discounted cost process associated with policy $(R,L)$. Let

(4.11)  $K_t = \int_0^t e^{-\gamma s}[h(Z_s)\,ds + r\,dR_s + \ell\,dL_s] , \quad t \ge 0 ,$

the second and third integrals being defined in the Lebesgue-Stieltjes
sense over $[0,t]$ with the usual convention at zero
($\Delta R_0 = R_0$ and $\Delta L_0 = L_0$).

(4.12) Corollary. With the assumptions and definitions above,

    $E_x[K_T + e^{-\gamma T} f(Z_T)]$

    $\qquad = f(x) + E_x\left\{\int_0^T e^{-\gamma t}(\Gamma f - \gamma f + h)(Z_t)\,dt\right\}$

    $\qquad + E_x\left\{\int_0^T e^{-\gamma t}[r + f'(Z_t)]\,d\rho_t\right\}$

    $\qquad + E_x\left\{\int_0^T e^{-\gamma t}[\ell - f'(Z_t)]\,d\lambda_t\right\}$

    $\qquad + E_x\left\{\sum_{0 \le t \le T} e^{-\gamma t}[\Delta f(Z)_t + r\,\Delta R_t + \ell\,\Delta L_t]\right\} .$

(4.13) Remark. For future reference, we express the right side of
(4.12) as $f(x) + E_x[I_1(T) + I_2(T) + I_3(T) + \Sigma(T)]$.

Proof. This follows immediately from (4.2) and (4.11), using the
identities $dR = d\rho + \Delta R$ and $dL = d\lambda + \Delta L$ in
(4.11).

For an interpretation of (4.12), imagine that you have responsibility
for operating the storage system described in §1 and have tentatively
decided to use the policy $(R,L)$. Further suppose that another person
offers to relieve you of this responsibility under either of the
following two arrangements.

(a)  You may pay $f(x)$ dollars at time zero and avoid all future
control and holding costs.

(b)  You may employ policy $(R,L)$ up to time $T$, absorbing the
control and holding costs incurred during that period, then make a
payment of $f(Z_T)$ at time $T$ and be relieved of all control
responsibilities thereafter.

Corollary (4.12) gives an expression for your expected discounted cost
under plan (b).

Now suppose that $f$ satisfies

(4.14)  $\Gamma f - \gamma f + h \ge 0$ on $S$ ,

(4.15)  $r + f' \ge 0$ on $S$ ,

and

(4.16)  $\ell - f' \ge 0$ on $S$ .

Using the notational convention (4.13), it is clear that (4.14)
implies $E_x[I_1(T)] \ge 0$, (4.15) implies $E_x[I_2(T)] \ge 0$, and
(4.16) implies $E_x[I_3(T)] \ge 0$. Furthermore, (4.15) and (4.16)
together imply $E_x[\Sigma(T)] \ge 0$ as follows. Suppose
$\Delta R_t > 0$ and $\Delta L_t = 0$. Then $\Delta Z_t = \Delta R_t$
and we have

    $\Delta f(Z)_t + r\,\Delta R_t + \ell\,\Delta L_t = f(Z_t) - f(Z_t - \Delta R_t) + r\,\Delta R_t$

    $\qquad = \int_{Z_t - \Delta R_t}^{Z_t} [f'(y) + r]\,dy \ge 0$

by (4.15). From (4.16) we get a similar inequality for times $t$ where
$\Delta R_t = 0$ and $\Delta L_t > 0$. Finally, (4.15) and (4.16)
together imply a similar inequality for times $t$ with
$\Delta R_t > 0$ and $\Delta L_t > 0$, because (and only because) we
have assumed $r + \ell > 0$. So we find that (4.14)-(4.16) imply

(4.17)  $E_x[K_T + e^{-\gamma T} f(Z_T)] \ge f(x) ,$

which means that plan (b) above is inferior to plan (a) for any choice
of $T$ (and regardless of the starting state $x$). Letting
$T \to \infty$ in (4.17) gives $k(x) \ge f(x)$, since $f$ is bounded on
$S$. Since $(R,L)$ and $x$ were arbitrary, we then have the following.

(4.18) Corollary. If $f \in C^2(S)$ satisfies (4.14)-(4.16), then
$f \le k$ for any cost function $k$ associated with a feasible policy.
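
Conditions (4.14)-(4.16) are easy to screen numerically for a
candidate $f$. The Python sketch below evaluates the three
inequalities on a uniform grid over $S$; it is only a sanity check
under stated assumptions (the grid, the tolerance, and the function
name `satisfies_verification_conditions` are all hypothetical, and the
caller must supply $f$, $f'$ and $f''$ as array-valued functions). A
pass on a grid does not prove the inequalities on all of $S$.

```python
import numpy as np


def satisfies_verification_conditions(f, df, d2f, h, mu, sigma, gamma,
                                      r, ell, alpha, beta, n=1001, tol=1e-9):
    """Check (4.14)-(4.16) at n grid points of S = [alpha, beta]."""
    x = np.linspace(alpha, beta, n)
    gen_f = 0.5 * sigma ** 2 * d2f(x) + mu * df(x)   # Gamma f
    c14 = gen_f - gamma * f(x) + h(x)                # (4.14)
    c15 = r + df(x)                                  # (4.15)
    c16 = ell - df(x)                                # (4.16)
    return bool(np.all(c14 >= -tol) and np.all(c15 >= -tol)
                and np.all(c16 >= -tol))


# The zero function is a trivial lower bound when h >= 0 and r, ell >= 0.
zero = lambda x: np.zeros_like(x)
print(satisfies_verification_conditions(zero, zero, zero, lambda x: x ** 2,
                                        mu=0.0, sigma=1.0, gamma=0.1,
                                        r=1.0, ell=1.0, alpha=-1.0, beta=1.0))
```

The usage example recovers the obvious bound $k \ge 0$; sharper
candidates raise $f$ until one of the three inequalities becomes tight
at every point of $S$.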

Corollary (4.18) is the only result from this section that will be
used later, but we should say at least a few words to connect our
basic identity (4.12) with the optimality equation (1.2) and the
general notion of policy improvement. Suppose that $f$ is the cost
function for a feasible policy $(R^*,L^*)$ that we want to test for
optimality. The left side of (4.12) gives the expected discounted
cost when we use an alternate policy $(R,L)$ up to time $T$ and employ
$(R^*,L^*)$ thereafter, with $T$ playing the role of time zero and
$Z_T$ viewed as the initial state of the control problem. (To make this
last phrase precise, one must introduce shift operators.) For
$(R^*,L^*)$ to be an optimal policy, it is necessary and sufficient
that all such attempts to improve $(R^*,L^*)$ through hybridization
fail, meaning that (4.17) holds for every $x$, every stopping time $T$,
and every feasible policy $(R,L)$. Combining this with (4.12), it can
be shown that (4.14)-(4.16) are necessary and sufficient for the
optimality of $(R^*,L^*)$. Finally, from (4.12) and the fact that $f$
is (by assumption) the cost function for a feasible policy, it can be
shown that at least one of the inequalities (4.14)-(4.16) is tight at
each point $x \in S$, meaning that $f$ satisfies the optimality
equation

(4.19)  $0 = [(\Gamma f - \gamma f + h) \wedge (r + f') \wedge (\ell - f')](x) , \quad x \in S ,$

which appeared earlier as (1.2). To repeat, if $f \in C^2(S)$ is the
cost function for a feasible policy, then (4.19) is necessary and
sufficient for the optimality of that policy, but only the sufficiency
has been proved rigorously.


5. Control Limit Policies

Let $a$ and $b$ be fixed throughout this section, with
$\alpha \le a < b \le \beta$. We want to construct the policy $(R,L)$
that enforces these control limits, and then calculate the associated
cost function.

These are essentially known results for one-dimensional Brownian
motion with reflecting barriers, but we do not know of any textbook
treatment that presents them in a form suitable for our purposes.

(5.1) Proposition. For each $\omega \in \Omega$ there exists a unique
pair of functions $R(\omega) = \{R_t(\omega), t \ge 0\}$ and
$L(\omega) = \{L_t(\omega), t \ge 0\}$ which jointly satisfy

(5.2)  $R_t(\omega) = \sup_{0 \le s \le t} [a - X_s(\omega) + L_s(\omega)]^+ , \quad t \ge 0 ,$

(5.3)  $L_t(\omega) = \sup_{0 \le s \le t} [X_s(\omega) + R_s(\omega) - b]^+ , \quad t \ge 0 .$

Both $R(\omega)$ and $L(\omega)$ are continuous and non-decreasing,
with $R_0(\omega) = [a - X_0(\omega)]^+$ and
$L_0(\omega) = [X_0(\omega) - b]^+$.

Proof. Let us first prove the last statement, taking $R$ and $L$ to be
any two functions which jointly satisfy (5.2)-(5.3). (The dependence
on $\omega$ will be suppressed throughout this proof.) If $R_0$ and
$L_0$ were both positive, then we would have $R_0 = a - X_0 + L_0$ by
(5.2) and $L_0 = X_0 + R_0 - b$ by (5.3), which implies $a = b$, a
contradiction. In exactly the same way, if $\Delta R_t > 0$ and
$\Delta L_t > 0$ for some $t > 0$, then the suprema in (5.2) and (5.3)
would both be achieved at $s = t$, implying $R_t = a - X_t + L_t$ and
$L_t = X_t + R_t - b$, and again we arrive at the contradiction
$a = b$. So $R$ and $L$ cannot jump simultaneously, and then
(5.2)-(5.3) and the continuity of $X$ imply that they have no jumps at
all.

We now construct a solution of (5.2)-(5.3) by successive
approximations. Beginning with the trial solution
$R_t^0 = L_t^0 = 0$, $t \ge 0$, let

(5.4)  $R_t^{n+1} = \sup_{0 \le s \le t} [a - X_s + L_s^n]^+ ,$

and

(5.5)  $L_t^{n+1} = \sup_{0 \le s \le t} [X_s + R_s^n - b]^+ ,$

for $n = 0, 1, \ldots$ and $t \ge 0$. Observe that $R^1 \ge R^0$ and
$L^1 \ge L^0$, and hence (by induction) that $R_t^n$ and $L_t^n$ are
increasing in $n$ for each fixed $t$. So we have

(5.6)  $R_t^n \uparrow R_t$ and $L_t^n \uparrow L_t$ as $n \uparrow \infty , \quad t \ge 0 ,$

and one can easily verify that the convergence in (5.6) is obtained in
a finite number of iterations for each fixed $t$. Thus $R$ and $L$ are
finite valued and jointly satisfy (5.2)-(5.3).
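
The successive approximations (5.4)-(5.5) translate directly into a
short computation on a sampled path. The Python sketch below is a
discrete-time analogue, not the continuous-time construction itself:
the path `X` is assumed to be given at finitely many grid points, the
suprema become running maxima, and the function name
`two_sided_regulator` and the iteration cap `max_iter` are
hypothetical. The loop stops as soon as an iteration leaves both
arrays unchanged.

```python
import numpy as np


def two_sided_regulator(X, a, b, max_iter=10_000):
    """Iterate (5.4)-(5.5) on a sampled path X until R and L stop changing."""
    X = np.asarray(X, dtype=float)
    R = np.zeros_like(X)   # trial solution R^0 = 0
    L = np.zeros_like(X)   # trial solution L^0 = 0
    for _ in range(max_iter):
        # Running maxima play the role of the suprema over 0 <= s <= t.
        R_next = np.maximum.accumulate(np.maximum(a - X + L, 0.0))
        L_next = np.maximum.accumulate(np.maximum(X + R - b, 0.0))
        if np.array_equal(R_next, R) and np.array_equal(L_next, L):
            return R, L
        R, L = R_next, L_next
    raise RuntimeError("no convergence within max_iter iterations")


# A simulated random-walk path started inside [a, b] = [0, 1].
rng = np.random.default_rng(0)
X = 0.5 + np.concatenate(([0.0], np.cumsum(rng.normal(0.0, 0.1, 2000))))
R, L = two_sided_regulator(X, a=0.0, b=1.0)
Z = X + R - L
print(Z.min(), Z.max())
```

At a fixed point of the iteration the regulated path $Z = X + R - L$
lies in $[a, b]$ up to rounding, since $R_t \ge a - X_t + L_t$ and
$L_t \ge X_t + R_t - b$ by construction.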

For uniqueness, let $(R',L')$ be another (distinct) solution of
(5.2)-(5.3). We have already seen that $R'$ and $L'$ must both be
continuous with $R_0' = R_0$ and $L_0' = L_0$, and it's obvious from
the construction above that $R' \ge R$ and $L' \ge L$. Let $T \ge 0$
be the infimum of those $t > 0$ at which either $R_t' > R_t$ or
$L_t' > L_t$. By continuity, we have $R_T' = R_T$ and $L_T' = L_T$,
and either $R'$ or $L'$ must increase at $T$. If $T$ were a point of
increase for both, then (5.2) and (5.3) would give us
$R_T' = a - X_T + L_T'$ and $L_T' = X_T + R_T' - b$ respectively,
which yields the contradiction $a = b$. So we conclude that exactly
one of the pair $(R',L')$ increases at $T$. Suppose it is $R'$,
implying that $L' = L$ over $[0, T+\varepsilon]$ for sufficiently small
$\varepsilon$. But then $R' = R$ over $[0, T+\varepsilon]$ by (5.2),
which contradicts the definition of $T$. In the same way, we cannot
have $R'$ flat and $L'$ increasing at $T$, so the proof of uniqueness
is complete.

(5.7) Proposition. Let $R(\omega)$ and $L(\omega)$ be as in (5.1), and set
Z = X + R - L. The processes R, L and Z are adapted and satisfy

(5.8)  $a \le Z_t \le b$,  $t \ge 0$,

(5.9)  $\int_0^t (Z_s - a)\,dR_s = 0$,  $t \ge 0$,

(5.10)  $\int_0^t (b - Z_s)\,dL_s = 0$,  $t \ge 0$.

Remark. One may paraphrase (5.9) by saying that R increases only
when Z = a. With our usual convention, (5.9) yields $(Z_0 - a)R_0 = 0$
when specialized to t = 0. Similar statements hold for (5.10).


Proof. The adaptedness is immediate from our construction (5.4)-(5.6)
of R and L, while (5.8)-(5.10) follow directly from (5.2)-(5.3).
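
Properties (5.8)-(5.10) can be checked numerically on a sampled path. The sketch below does not use the paper's construction. It assumes a discrete-time analogue in which each increment of X is applied to Z and the minimal push needed to return to [a, b] is then added. The names `reflect_path` and `complementarity_sums` are hypothetical, and the integrals in (5.9)-(5.10) are replaced by sums over the increments of R and L, including the initial jumps $R_0$ and $L_0$.

```python
import numpy as np

def reflect_path(x, a, b):
    """One-pass discrete analogue of Z = X + R - L with barriers at a and b."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    R, L, Z = np.zeros(n), np.zeros(n), np.zeros(n)
    R[0] = max(a - x[0], 0.0)  # R_0 = [a - X_0]^+
    L[0] = max(x[0] - b, 0.0)  # L_0 = [X_0 - b]^+
    Z[0] = x[0] + R[0] - L[0]
    for t in range(1, n):
        free = Z[t - 1] + (x[t] - x[t - 1])  # position before any control
        dR = max(a - free, 0.0)              # push up only if below a
        dL = max(free - b, 0.0)              # push down only if above b
        R[t], L[t] = R[t - 1] + dR, L[t - 1] + dL
        Z[t] = free + dR - dL
    return R, L, Z

def complementarity_sums(R, L, Z, a, b):
    """Discrete versions of the integrals in (5.9) and (5.10)."""
    dR = np.diff(R, prepend=0.0)  # first entry is the initial jump R_0
    dL = np.diff(L, prepend=0.0)
    return float(np.sum((Z - a) * dR)), float(np.sum((b - Z) * dL))

R, L, Z = reflect_path([0.5, 2.0, -1.0, 0.5], a=0.0, b=1.0)
lower_sum, upper_sum = complementarity_sums(R, L, Z, a=0.0, b=1.0)
```

In this example both sums are zero, because R moves only at times when Z sits at a and L moves only at times when Z sits at b.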

It can further be shown that the unique pair of functions (R,L)
satisfying (5.8)-(5.10), with Z = X + R - L, is that constructed in the
proof of (5.1). This means that the characterizations of (R,L)
given in Propositions (5.1) and (5.7) are completely equivalent. We
observed earlier that the convergence (5.6) is obtained in a finite
number of iterations for each fixed t, which makes it possible to
write out a general (and very messy) recursive formula for R and L
in terms of a sequence of stopping times $\{T_n\}$. This was done in
[6] for the case a = 0, but the only relevant properties of the
resulting pair (R,L) are those expressed in (5.7). It can be shown
that R (respectively L) is the local time of the diffusion process
Z at the boundary a (respectively b), but we shall have no need
for this fact.

(5.11) Proposition. Suppose that $k \in C^1(S)$ is twice continuously
differentiable on [a,b] and satisfies

(5.12)  $\Gamma k(x) - \gamma k(x) + h(x) = 0$,  $a \le x \le b$,

(5.13)  $k'(x) + r = 0$,  $\alpha \le x \le a$,

(5.14)  $k'(x) - \ell = 0$,  $b \le x \le \beta$.

Then

$k(x) = E_x\{\int_0^\infty e^{-\gamma t}[h(Z_t)\,dt + r\,dR_t + \ell\,dL_t]\}$,  $x \in S$.

Remark. This of course shows that there is at most one k satisfying
the stated hypotheses, and we shall exhibit a solution (or rather the
solution) shortly.

Proof. First fix a starting state $x \in [a,b]$. Defining the
cumulative discounted cost $K_t$ as in (4.11), we need to prove that
$E_x(K_\infty) = k(x)$. Fixing T > 0, we shall apply Corollary (4.12)
with $k \in C^2[a,b]$ replacing $f \in C^2[\alpha,\beta]$. Since L and R
have no jumps when $X_0 = x$, we have $\rho = R$ and $\lambda = L$, and
(4.12) yields

(5.15)  $E_x[K_T + e^{-\gamma T} k(Z_T)]$

    $= k(x) + E_x\{\int_0^T e^{-\gamma t}(\Gamma k - \gamma k + h)(Z_t)\,dt\}$

    $+ E_x\{\int_0^T e^{-\gamma t}[r + k'(Z_t)]\,dR_t\}$

    $+ E_x\{\int_0^T e^{-\gamma t}[\ell - k'(Z_t)]\,dL_t\}$.

Since $P_x(a \le Z_t \le b$ for $t \ge 0) = 1$, the second term on the
right side of (5.15) vanishes by (5.12). Next, (5.9) says that R
increases only when Z = a, so the third term on the right side of
(5.15) is

$E_x\{\int_0^T e^{-\gamma t}[r + k'(a)]\,dR_t\}$,

which vanishes by (5.13). Similarly, the final term on the right side
of (5.15) vanishes by (5.10) and (5.14). Letting $T \to \infty$ in (5.15),
and using the boundedness of $k(Z_T)$, we thus obtain $E_x(K_\infty) = k(x)$
as desired.

If $a = \alpha$ and $b = \beta$, there is nothing left to prove. Next
suppose $b < \beta$ and consider a starting state $x \in (b,\beta]$. From
the construction of (R,L) we have

(5.16)  $E_x(K_\infty) = E_x(\ell L_0) + E_b(K_\infty)$

    $= \ell(x-b) + E_b(K_\infty)$.

The first part of the proof shows that $E_b(K_\infty) = k(b)$, and
$\ell(x-b) + k(b) = k(x)$ by (5.14), so (5.16) reduces to
$E_x(K_\infty) = k(x)$ as desired. A similar argument, using (5.13), gives
$E_x(K_\infty) = k(x)$ for $\alpha \le x < a$, which completes the proof.

We conclude the section by constructing a solution for the
ordinary differential equation (5.12)-(5.14). To emphasize the
dependence of this solution on the control limits a and b, we
denote it $k_{ab}(x)$. First, let $g \in C^2(S)$ be the unique solution
of

(5.17)  $\Gamma g(x) - \gamma g(x) + h(x) = 0$,  $\alpha \le x \le \beta$,

(5.18)  $g(\alpha) = g(\beta) = 0$.

It is well known that exactly one such g exists, and it can be
written explicitly as an integral involving a known Green's function.
Next, setting

$\beta_1 = [-\mu + (\mu^2 + 2\gamma\sigma^2)^{1/2}]/\sigma^2$,

$\beta_2 = [-\mu - (\mu^2 + 2\gamma\sigma^2)^{1/2}]/\sigma^2$,

$c_1 = \frac{\exp(\beta_2 a)}{\beta_1[\exp(\beta_1 b + \beta_2 a) - \exp(\beta_1 a + \beta_2 b)]}$,

$c_2 = \frac{\exp(\beta_1 a)}{\beta_2[\exp(\beta_1 a + \beta_2 b) - \exp(\beta_1 b + \beta_2 a)]}$,

$d_1 = \frac{\exp(\beta_2 b)}{\beta_1[\exp(\beta_1 a + \beta_2 b) - \exp(\beta_1 b + \beta_2 a)]}$,

$d_2 = \frac{\exp(\beta_1 b)}{\beta_2[\exp(\beta_1 b + \beta_2 a) - \exp(\beta_1 a + \beta_2 b)]}$,

we define

$\phi_1(x) = c_1 e^{\beta_1 x} + c_2 e^{\beta_2 x}$,  $a \le x \le b$,

$\phi_2(x) = d_1 e^{\beta_1 x} + d_2 e^{\beta_2 x}$,  $a \le x \le b$,

$\phi(x) = g(x) + [\ell - g'(b)]\,\phi_1(x) - [r + g'(a)]\,\phi_2(x)$,  $a \le x \le b$,

and finally

$k_{ab}(x) = \begin{cases} \phi(a) + (a-x)r, & \alpha \le x \le a, \\ \phi(x), & a \le x \le b, \\ \phi(b) + (x-b)\ell, & b \le x \le \beta. \end{cases}$

One can easily verify that

$\Gamma\phi_1(x) - \gamma\phi_1(x) = \Gamma\phi_2(x) - \gamma\phi_2(x) = 0$,  $a \le x \le b$,

$\phi_1'(a) = \phi_2'(b) = 0$ and $\phi_1'(b) = \phi_2'(a) = 1$.

From this and (5.17)-(5.18) it follows directly that $k_{ab}$ is the
desired solution of (5.12)-(5.14). Observe that $k_{ab}'$ is continuous
on all of $[\alpha,\beta]$ and that $k_{ab}''$ is continuous everywhere
except possibly at the control limits a and b.

In this construction of $k_{ab}$ one can start with any function
$g \in C^2(S)$ satisfying the main equation (5.17). For concreteness we
have specified one particular solution via the boundary conditions
(5.18). For future reference, note that our choice of g does not
depend on the control limits a and b.
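
The construction of $k_{ab}$ can be illustrated numerically. The sketch below makes several assumptions that are not in the paper: the holding cost is taken to be h(x) = x^2, and g is taken to be the quadratic particular solution of the main equation rather than the solution pinned down by the boundary conditions on g (the text notes that any solution of the main equation may be used). The names `build_k_ab`, `ell` and the parameter values are illustrative, and the endpoints $\alpha$, $\beta$ of S are not modelled.

```python
import math

def build_k_ab(a, b, mu, sigma, gamma, r, ell):
    """Return (k, dk, phi) for the assumed holding cost h(x) = x**2.

    phi(x, order) gives phi or its first/second derivative on [a, b].
    """
    root = math.sqrt(mu * mu + 2.0 * gamma * sigma * sigma)
    b1 = (-mu + root) / sigma**2  # positive exponent beta_1
    b2 = (-mu - root) / sigma**2  # negative exponent beta_2

    # Quadratic particular solution g of (1/2)s^2 g'' + mu g' - gamma g + x^2 = 0.
    A = 1.0 / gamma
    B = 2.0 * mu / gamma**2
    C = sigma**2 / gamma**2 + 2.0 * mu**2 / gamma**3
    g = lambda x: A * x * x + B * x + C
    dg = lambda x: 2.0 * A * x + B

    D = math.exp(b1 * b + b2 * a) - math.exp(b1 * a + b2 * b)
    c1 = math.exp(b2 * a) / (b1 * D)
    c2 = -math.exp(b1 * a) / (b2 * D)
    d1 = -math.exp(b2 * b) / (b1 * D)
    d2 = math.exp(b1 * b) / (b2 * D)
    w1 = ell - dg(b)  # coefficient of phi_1
    w2 = r + dg(a)    # coefficient of phi_2

    def phi(x, order=0):
        e1 = b1**order * math.exp(b1 * x)
        e2 = b2**order * math.exp(b2 * x)
        g_part = (g(x), dg(x), 2.0 * A)[order]
        return g_part + w1 * (c1 * e1 + c2 * e2) - w2 * (d1 * e1 + d2 * e2)

    def k(x):
        if x < a:
            return phi(a) + (a - x) * r    # linear below the lower limit
        if x > b:
            return phi(b) + (x - b) * ell  # linear above the upper limit
        return phi(x)

    def dk(x):
        if x < a:
            return -r
        if x > b:
            return ell
        return phi(x, 1)

    return k, dk, phi

k, dk, phi = build_k_ab(a=-1.0, b=1.5, mu=0.2, sigma=1.0, gamma=0.5, r=1.0, ell=2.0)
```

With these inputs `dk(-1.0)` is -r and `dk(1.5)` is ell up to rounding, which is the pair of boundary conditions that $\phi_1$ and $\phi_2$ are designed to deliver.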
6. Optimal Control Limits

Maintaining the notation of the previous section, let $f = k_{ab}$
be the cost function for a control limit policy with
$\alpha \le a < b \le \beta$. Recall that f is continuously differentiable
on S and twice continuously differentiable except possibly at a and
b. We would like to find conditions (on a and b) under which f
satisfies the optimality conditions (4.14)-(4.16).

The first thing to note is that f can only satisfy (4.14)-(4.16)
if it is twice continuously differentiable, meaning that
$f''(x) \to 0$ as $x \downarrow a$ if $\alpha < a$, and $f''(x) \to 0$ as
$x \uparrow b$ if $b < \beta$. To see this, suppose first that $\alpha < a$
and that $f''(a+) > 0$. Since $(\Gamma f - \gamma f + h)(a+) = 0$, and f, f'
and h are continuous at a while $f''(a-) = 0$, we have
$(\Gamma f - \gamma f + h)(a-) < 0$, which violates (4.14). Now suppose,
on the other hand, that $f''(a+) < 0$. Since f' is continuous at a
with $f'(a) = -r$, this implies $f'(a+\epsilon) < -r$ for $\epsilon > 0$
sufficiently small, which violates (4.15). So continuity of f''
at the lower control limit is necessary if f is to satisfy
(4.14)-(4.16), and a similar analysis holds at the upper control
limit. We turn now to the matter of sufficiency.


(6.1) Proposition. Suppose that $f = k_{ab}$ is twice continuously
differentiable on S with $-r \le f' \le \ell$. Then f satisfies the
optimality conditions (4.14)-(4.16).


Proof. If $a = \alpha$ and $b = \beta$, the conclusion is automatic.
Suppose then that $a > \alpha$. Defining

$\phi(x) = \Gamma f(x) - \gamma f(x) + h(x)$

    $= \frac{1}{2}\sigma^2 f''(x) + \mu f'(x) - \gamma f(x) + h(x)$,  $x \in S$,

we have $\phi(x) = 0$ for $a \le x \le b$ and need to prove that
$\phi(x) \ge 0$ for $\alpha \le x \le a$. (The proof that $\phi(x) \ge 0$ for
$b \le x \le \beta$ if $b < \beta$ is virtually identical, so we omit it.)
Let us define

$\psi(x) = [h(x) - h(a)] + \gamma r(x-a)$,

$\theta(x) = [\Gamma f(x) - \Gamma f(a)] - \gamma[f(x) - f(a)] + \gamma f'(a)(x-a)$.

Remembering that $\phi(a) = 0$, $\Gamma f(x) = \Gamma f(a)$ for
$\alpha \le x \le a$, $f(x) = f(a) + r(a-x)$ for $\alpha \le x \le a$, and
$f'(a) = -r$, we then have

(6.2)  $\phi(x) = \begin{cases} \psi(x), & \text{if } \alpha \le x \le a, \\ \psi(x) + \theta(x), & \text{if } a \le x \le \beta. \end{cases}$

Next, since $f \in C^2(S)$ with $f''(a) = 0$, $f'(a) = -r$ and
$f'(x) \ge -r$ for all x by assumption, it must be that
$f''(a+\epsilon) \ge 0$ for all $\epsilon \ge 0$ sufficiently small. From
Taylor's theorem and the definition of $\theta(\cdot)$, we then have the
following: for each $\delta > 0$ there exists an $\epsilon > 0$ such that

(6.3)  $\theta(x) \ge -(x-a)\delta$,  for $a \le x \le a+\epsilon$.

But $\phi(x) = 0$ for $a \le x \le b$, so (6.2) and (6.3) together imply

(6.4)  $\psi(x) \le (x-a)\delta$,  for $a \le x \le a+\epsilon$.

Convexity of h implies convexity of $\psi$, and obviously $\psi(a) = 0$, so
(6.4) implies

(6.5)  $\psi(x) \ge -(a-x)\delta$,  for $\alpha \le x \le a$.

Since $\delta > 0$ was arbitrary, this gives $\psi(x) \ge 0$ for
$\alpha \le x \le a$ and hence $\phi(x) \ge 0$ for $\alpha \le x \le a$ by
(6.2). This completes the proof of the proposition.
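
The decomposition (6.2) used in this proof can be checked in two lines. The following is a reader's verification, not part of the original argument. It writes $\phi = \Gamma f - \gamma f + h$ for the function shown to be nonnegative, $\psi(x) = [h(x) - h(a)] + \gamma r(x-a)$, and $\theta(x) = [\Gamma f(x) - \Gamma f(a)] - \gamma[f(x) - f(a)] + \gamma f'(a)(x-a)$; these symbol names are an assumption about the intended notation.

```latex
\begin{align*}
\psi(x) + \theta(x)
  &= [h(x) - h(a)] + [\Gamma f(x) - \Gamma f(a)] - \gamma [f(x) - f(a)]
     + \gamma \bigl(r + f'(a)\bigr)(x - a) \\
  &= \phi(x) - \phi(a) + \gamma \bigl(r + f'(a)\bigr)(x - a) \\
  &= \phi(x),
     \qquad \text{since } \phi(a) = 0 \text{ and } f'(a) = -r . \\[4pt]
\theta(x)
  &= 0 - \gamma\, r (a - x) - \gamma\, r (x - a) = 0,
     \qquad \alpha \le x \le a ,
\end{align*}
```

The second display uses $\Gamma f(x) = \Gamma f(a)$ and $f(x) - f(a) = r(a-x)$ on $[\alpha, a]$, so $\phi = \psi$ there, while $\phi = \psi + \theta$ holds on all of S.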

We now construct a control limit policy whose cost function
satisfies the hypotheses of Proposition (6.1). For each
$a \in [\alpha,\beta)$ let

$b^*(a) = \sup\{b \in (a,\beta]: k_{ab}'(x) \le \ell,\ a \le x \le b\}$.

From the explicit formula for $k_{ab}$ given in §5, it follows easily
that $b^*(a) > a$. Hereafter let $k_a \equiv k_{a\,b^*(a)}$ for
$a \in [\alpha,\beta)$. Next define

$a^* = \inf\{a \in [\alpha,\beta): k_a'(x) \ge -r,\ a \le x \le b^*(a)\}$.

Again it follows easily from the formulas of §5 that $a^* < \beta$.
Hereafter we set $b^* = b^*(a^*)$.

(6.6) Proposition. The cost function $f = k_{a^*b^*}$ satisfies the
hypotheses of (6.1), and thus the control limit policy with parameters
$a^*$ and $b^*$ is optimal.

Proof. The inequality $-r \le f' \le \ell$ is immediate from our
construction. It remains to show that f'' is continuous, which
means simply that

(6.7)  $f''(x) \to 0$ as $x \downarrow a^*$ if $a^* > \alpha$,

(6.8)  $f''(x) \to 0$ as $x \uparrow b^*$ if $b^* < \beta$.

From the formulas of §5 it is immediate that $k_{ab}$ and its first two
derivatives vary continuously with the parameters a and b, and it
is this continuity plus the definitions of $a^*$ and $b^*$ that one uses
in verifying (6.7)-(6.8). The verification is straightforward but
tedious, so we leave it as an exercise. Propositions (6.1) and (4.18)
give $f \le k$ for any cost function k associated with a feasible
policy, which completes the proof of the proposition.
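
As a rough illustration of how control limits might be computed, the sketch below uses the smoothness conditions (6.7)-(6.8) as a root-finding target instead of the sup/inf construction above. This is a shortcut, not the paper's procedure, and it is restricted to a symmetric special case assumed here: $\mu = 0$, $r = \ell$, h(x) = x^2, and S wide enough that both limits are interior. Under those assumptions one can look for limits of the form a = -b, and the hypotheses of (6.1) are then checked on a grid. The names `phi_derivs` and `symmetric_limit` are hypothetical.

```python
import math

def phi_derivs(x, a, b, mu, sigma, gamma, r, ell):
    """Return (phi'(x), phi''(x)) on [a, b] for the assumed cost h(x) = x**2."""
    root = math.sqrt(mu * mu + 2.0 * gamma * sigma * sigma)
    b1 = (-mu + root) / sigma**2
    b2 = (-mu - root) / sigma**2
    dg = lambda y: 2.0 * y / gamma + 2.0 * mu / gamma**2  # g' for quadratic g
    d2g = 2.0 / gamma                                     # g''
    D = math.exp(b1 * b + b2 * a) - math.exp(b1 * a + b2 * b)
    # Derivatives of phi_1 and phi_2 written directly in closed form.
    dphi1 = (math.exp(b2 * a + b1 * x) - math.exp(b1 * a + b2 * x)) / D
    dphi2 = (math.exp(b1 * b + b2 * x) - math.exp(b2 * b + b1 * x)) / D
    d2phi1 = (b1 * math.exp(b2 * a + b1 * x) - b2 * math.exp(b1 * a + b2 * x)) / D
    d2phi2 = (b2 * math.exp(b1 * b + b2 * x) - b1 * math.exp(b2 * b + b1 * x)) / D
    w1, w2 = ell - dg(b), r + dg(a)
    return dg(x) + w1 * dphi1 - w2 * dphi2, d2g + w1 * d2phi1 - w2 * d2phi2

def symmetric_limit(sigma, gamma, cost, lo=1e-2, hi=10.0, iters=80):
    """Bisection for b with phi''(b-) = 0 when a = -b, mu = 0, r = ell = cost."""
    def curvature(b):
        return phi_derivs(b, -b, b, 0.0, sigma, gamma, cost, cost)[1]
    if not (curvature(lo) > 0.0 > curvature(hi)):
        raise ValueError("bracket [lo, hi] does not straddle a sign change")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if curvature(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b_star = symmetric_limit(sigma=1.0, gamma=0.5, cost=1.0)
grid = [-b_star + 2.0 * b_star * i / 200 for i in range(201)]
slopes = [phi_derivs(x, -b_star, b_star, 0.0, 1.0, 0.5, 1.0, 1.0)[0] for x in grid]
# Hypothesis of (6.1): -r <= f' <= ell on [a, b] (checked up to rounding).
slopes_ok = all(-1.0 - 1e-9 <= s <= 1.0 + 1e-9 for s in slopes)
```

If `slopes_ok` holds and the second derivative vanishes at both limits, the candidate pair (-b_star, b_star) meets the hypotheses of (6.1) for this particular example. A general scheme would have to treat the asymmetric case and limits that fall on the boundary of S.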
The approach that we have taken to determining an optimal policy
does not work directly with the optimality equation (1.2), but the
return function for our optimal control limit policy does in fact
satisfy this relationship. Defining $f = k_{a^*b^*}$ as in (6.6), we
have seen that f is the minimal cost function and that it satisfies
each of the inequalities (4.14)-(4.16). Furthermore, from the
definition and construction of $k_{ab}$ given in §5, we see that (4.14)
holds with equality on $[a^*,b^*]$, (4.15) holds with equality on
$[\alpha,a^*]$ and (4.16) holds with equality on $[b^*,\beta]$. Thus,

(6.9)  $0 = [\Gamma f(x) - \gamma f(x) + h(x)] \wedge [r + f'(x)] \wedge [\ell - f'(x)]$,  $x \in S$,

as claimed. Adding the boundary conditions

(6.10)  $r + f'(\alpha) = 0$ and $\ell - f'(\beta) = 0$,

we believe that the unique function $f \in C^2(S)$ satisfying
(6.9)-(6.10) is the minimum cost function, but we have not attempted
to prove this. The boundary conditions (6.10) are essential,
incidentally, since there may exist $f \in C^2(S)$ satisfying
$\Gamma f - \gamma f + h = 0$ on S with $r + f'(x) > 0$ for all $x \in S$ or
$\ell - f'(x) > 0$ for all $x \in S$. Such an f satisfies (6.9) but lies
strictly below the minimum cost function.
7. Concluding Remarks

Two potential areas of application for our instantaneous control
problem are cash management and production control. See Harrison and
Taylor [6] for a discussion of these applications, oriented toward the
specific holding cost function (1.1), and further references. For
cash management problems, the non-linear (but convex) holding cost
function

(7.1)  $h(x) = \begin{cases} hx, & \text{if } x \ge 0 \quad (h > 0), \\ q|x|, & \text{if } x < 0 \quad (q > 0), \end{cases}$

is also of practical importance. This arises when the firm can
maintain a negative cash balance through short-term borrowing, and a
similar holding cost structure occurs in production control problems
where demand can be backlogged at some penalty cost. Motivated by the
stochastic cash management problem, Constantinides and Richard [3]
have studied the optimal control of Brownian motion when the holding
cost function is (7.1) and there are both fixed and proportional costs
of control. This gives a problem of optimal impulse control [2], and
they show the existence of an optimal policy characterized by four
critical numbers. The applications of instantaneous control in
production and inventory theory will be further developed in [7],
using the results of this paper.

Two problems of instantaneous control, closely related to ours
but much more difficult, have been solved in a beautiful recent paper
by Beneš, Shepp and Witsenhausen [1]. Using our notational system,
one of their problems can be stated as follows: Find a pair of
controls (R,L) to minimize

$E\{\int_0^\infty e^{-\alpha t}(X_t + R_t - L_t)^2\,dt\}$,

subject to the constraint that $R_\infty + L_\infty \le y < \infty$. Here one
has no explicit cost of control, but there is a finite limit on the
total amount of control that can be exerted over the infinite planning
horizon. The authors take essentially the same approach employed
here, using martingale methods (the generalized Ito formula) to verify
optimality of a candidate policy arrived at from certain heuristic
considerations. Their optimal policy has a much more complex form
than ours, however, so the argument is much more intricate. This is
the only previous paper we know of, other than [6], which explicitly
considers an instantaneous control problem for Brownian motion, as
opposed to control at a bounded rate or optimal impulse control.